Motif Discovery from a Large Number of Sequences: A Case Study with Disease Resistance Genes in Arabidopsis thaliana

Authors

  • Irfan Gunduz
  • Sihui Zhao
  • Mehmet M. Dalkilic
  • Sun Kim
Abstract

Discovering motifs in a set of sequences is an important problem in biology. Although computational techniques for sequence motif discovery have been studied extensively, discovering motifs in a large number of sequences remains challenging. We propose a novel computational framework that combines multiple techniques: pairwise sequence comparison, clustering, HMM-based sequence search, motif finding, and block comparison. We tested the framework's ability to extract motifs from disease resistance genes and candidates in the Arabidopsis thaliana genome, and it discovered all known motifs related to disease resistance. When the same set of sequences was submitted as a whole, without clustering, to the motif discovery tools MEME and Pratt, they failed to detect the disease resistance gene motifs. Clustering is the crucial component of the framework. One of its benefits is computational efficiency, since the clustering algorithm divides the set of sequences into smaller groups.
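
The clustering step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which pairwise comparison or clustering algorithm was used, so this example assumes k-mer Jaccard similarity as a stand-in for pairwise sequence comparison and single-linkage grouping as the clustering rule. The function names, the `k` and `threshold` parameters, and the toy sequences are all hypothetical.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sequences(seqs, k=3, threshold=0.3):
    """Single-linkage clustering (an assumed stand-in, not the paper's method).

    Any two sequences whose k-mer Jaccard similarity is at least
    `threshold` are linked; clusters are the connected components.
    Returns a sorted list of lists of sequence indices.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    profiles = [kmer_set(s, k) for s in seqs]
    for i, j in combinations(range(len(seqs)), 2):
        if jaccard(profiles[i], profiles[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy protein-like strings, invented for illustration only.
toy = ["GKTTLAGKTTLA", "GKTTLAGKTTLV", "LRRLDLSNNLRR", "LRRLDLSNNLRK"]
clusters = cluster_sequences(toy)
```

In a pipeline like the one the abstract describes, each resulting group would then be passed separately to the later stages (HMM-based search and motif finding), so those stages work on small sets of related sequences rather than on the whole collection at once.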

Similar resources

Identification and characterization of a NBS–LRR class resistance gene analog in Pistacia atlantica subsp. Kurdica

P. atlantica subsp. Kurdica, with the local name of Baneh, is a wild medicinal plant which grows in Kurdistan, Iran. The identification of resistance gene analogs holds great promise for the development of resistant cultivars. A PCR approach with degenerate primers designed according to conserved NBS-LRR (nucleotide binding site-leucine rich repeat) regions of known disease-resistance (R) gene...

Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences

This work presents a hybrid method for motif discovery in DNA sequences. The proposed method called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses the stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...

The symbiotic effect of Piriformospora indica on induced resistance against bakanae disease in rice (Oryza sativa L.)

The root endophytic fungus Piriformospora indica colonizes the roots of a large number of plant species, including cereals and Brassicaceae. Several reports indicate that P. indica protects roots from different pathogens. In the present study, rice plants were pre-inoculated with P. indica and were subsequently infected with Fusarium proliferatum, as the causal agent of root rot and c...

Evaluation of Lamivudine Resistance Mutations in HBV/HIV Co-infected Patients

Background and Objective: The drug resistance mutations are key elements in the failure of long-term treatment of Hepatitis B virus (HBV) and human immunodeficiency virus (HIV) infections. The mutation in the YMDD motif in the P gene of HBV is the most critical factor in antiviral drug (especially lamivudine) resistance. This study aimed to assess the YMDD motif and other polymerase gene mutati...

The False Discovery Rate in Simultaneous Fisher and Adjusted Permutation Hypothesis Testing on Microarray Data

Background and Objectives: In recent years, new technologies have produced large amounts of data, and in the field of biology, microarray technology has also developed dramatically. Meanwhile, the Fisher test is used to compare the control group with two or more experimental groups and also to detect the differentially expressed genes. In this study, the false discovery rate was investiga...

Publication date: 2003